THCHS-30 : A Free Chinese Speech Corpus
نویسندگان
چکیده
Speech data is crucially important for speech recognition research. There are quite some speech databases that can be purchased at prices that are reasonable for most research institutes. However, for young people who just start research activities or those who just gain initial interest in this direction, the cost for data is still an annoying barrier. We support the ‘free data’ movement in speech recognition: research institutes (particularly supported by public funds) publish their data freely so that new researchers can obtain sufficient data to kick off their career. In this paper, we follow this trend and release a free Chinese speech database THCHS-30 that can be used to build a full-fledged Chinese speech recognition system. We report the baseline system established with this database, including the performance under highly noisy conditions.
منابع مشابه
A Super Phonetic System and Multi-dialect Chinese Speech Corpus for Speech Recognition
In this paper, we describe the work on Chinese multi-dialect speech processing. Based on the phonetic analysis of ten Chinese dialects, we have created a Chinese super phonetic system for the Chinese speech recognition. To exam this phonetic system and develop Chinese dialect speech technology, we are building a multi-dialect speech corpus, which includes 10 dialect areas and 2000 speakers.
متن کاملThe Effect of Colligational Corpus-based Instruction on Enhancing the Pragmalinguistic Knowledge of Request Speech Act among Iranian Intermediate EFL Learners
This study investigated the effectiveness of colligational corpus-based instruction on enhancing the pragmalinguistic knowledge of speech act of request among Iranian intermediate EFL learners. The objective of the study was to find out whether or not providing students with corpora through using colligational instruction had any significant effects on enhancing their pragmalinguistic knowledge...
متن کاملThe Effect of Colligational Corpus-based Instruction on Enhancing the Pragmalinguistic Knowledge of Request Speech Act among Iranian Intermediate EFL Learners
This study investigated the effectiveness of colligational corpus-based instruction on enhancing the pragmalinguistic knowledge of speech act of request among Iranian intermediate EFL learners. The objective of the study was to find out whether or not providing students with corpora through using colligational instruction had any significant effects on enhancing their pragmalinguistic knowledge...
متن کاملPCFG Parsing for Restricted Classical Chinese Texts
The Probabilistic Context-Free Grammar (PCFG) model is widely used for parsing natural languages, including Modern Chinese. But for Classical Chinese, the computer processing is just commencing. Our previous study on the part-of-speech (POS) tagging of Classical Chinese is a pioneering work in this area. Now in this paper, we move on to the PCFG parsing of Classical Chinese texts. We continue t...
متن کاملConstruction and evaluations of an annotated Chinese conversational corpus in travel domain for the language model of speech recognition
In this paper we describe the development of an annotated Chinese conversational textual corpus for speech recognition in a speech-to-speech translation system in the travel domain. A total of 515,000 manually checked utterances were constructed, which provided a 3.5 million word Chinese corpus with word segmentation and part-of-speech tagging. The annotation is conducted with careful manual ch...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1512.01882 شماره
صفحات -
تاریخ انتشار 2015